{2023.06}[foss/2023a] TensorFlow v2.15.1 w/ CUDA 12.1.1 + eb_hooks.py #35

Open: wants to merge 3 commits into base: main

TopRichard
Contributor

@TopRichard TopRichard commented Jul 9, 2025

This PR uses a CUDA-ARM patch to work around the previously seen error:

"__Int8x8_t" is undefined
  typedef __Int8x8_t int8x8_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(41): error: identifier "__Int16x4_t" is undefined
  typedef __Int16x4_t int16x4_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(42): error: identifier "__Int32x2_t" is undefined
  typedef __Int32x2_t int32x2_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(43): error: identifier "__Int64x1_t" is undefined
  typedef __Int64x1_t int64x1_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(44): error: identifier "__Float16x4_t" is undefined
  typedef __Float16x4_t float16x4_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(45): error: identifier "__Float32x2_t" is undefined
  typedef __Float32x2_t float32x2_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(46): error: identifier "__Poly8x8_t" is undefined
  typedef __Poly8x8_t poly8x8_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(47): error: identifier "__Poly16x4_t" is undefined
  typedef __Poly16x4_t poly16x4_t;
          ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(48): error: identifier "__Uint8x8_t" is undefined
  typedef __Uint8x8_t uint8x8_t;

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(793): error: identifier "__builtin_aarch64_raddhnv2di_uuu" is undefined
    return __builtin_aarch64_raddhnv2di_uuu (__a, __b);
           ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(800): error: identifier "__builtin_aarch64_addhn2v8hi" is undefined
    return __builtin_aarch64_addhn2v8hi (__a, __b, __c);
           ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(807): error: identifier "__builtin_aarch64_addhn2v4si" is undefined
    return __builtin_aarch64_addhn2v4si (__a, __b, __c);
           ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(814): error: identifier "__builtin_aarch64_addhn2v2di" is undefined
    return __builtin_aarch64_addhn2v2di (__a, __b, __c);
           ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(821): error: identifier "__builtin_aarch64_addhn2v8hi_uuuu" is undefined
    return __builtin_aarch64_addhn2v8hi_uuuu (__a, __b, __c);
           ^

/cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/nvidia/grace/software/GCCcore/12.3.0/lib/gcc/aarch64-unknown-linux-gnu/12.3.0/include/arm_neon.h(828): error: identifier "__builtin_aarch64_addhn2v4si_uuuu" is undefined
    return __builtin_aarch64_addhn2v4si_uuuu (__a, __b, __c);
           ^

Error limit reached.
100 errors detected in the compilation of "tensorflow/core/kernels/reshape_util_gpu.cu.cc".

On x86_64 with cc80:

CPU tests:
Executed 847 out of 847 tests: 847 tests pass.

GPU tests:
Executed 189 out of 189 tests: 189 tests pass.

On aarch64 with cc90:

CPU tests:
Executed 847 out of 847 tests: 847 tests pass.

GPU tests:
Executed 189 out of 189 tests: 188 tests pass and 1 fails locally:

Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
2025-07-14 22:05:24.196765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 94876 MB memory:  -> device: 0, name: NVIDIA GH200 480GB, pci bus id: 0009:01:00.0, compute capability: 9.0
2025-07-14 22:05:24.216862: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 99485220864 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216882: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 89536700416 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216887: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 80583024640 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216890: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 72524718080 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216892: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 65272246272 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216903: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 58745020416 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216914: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 52870516736 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216916: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 47583465472 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216918: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 42825117696 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216921: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 38542606336 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216923: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 34688344064 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216925: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 31219509248 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216927: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 28097558528 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216929: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 25287802880 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216931: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 22759022592 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216934: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 20483119104 on device 0 within provided limit.  limit=2147483648]
2025-07-14 22:05:24.216981: W external/local_xla/xla/stream_executor/stream_executor_pimpl.cc:463] Not enough memory to allocate 2241240576 on device 0 within provided limit.  limit=2147483648]
INFO:tensorflow:time(__main__.SparseToDenseTest.test2d): 0.62s
I0714 22:05:24.369109 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.test2d): 0.62s
[       OK ] SparseToDenseTest.test2d
[ RUN      ] SparseToDenseTest.test3d
INFO:tensorflow:time(__main__.SparseToDenseTest.test3d): 0.0s
I0714 22:05:24.372016 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.test3d): 0.0s
[       OK ] SparseToDenseTest.test3d
[ RUN      ] SparseToDenseTest.testBadDefault
2025-07-14 22:05:24.374394: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at sparse_to_dense_op.cc:218 : INVALID_ARGUMENT: default_value should be a scalar.
INFO:tensorflow:time(__main__.SparseToDenseTest.testBadDefault): 0.0s
I0714 22:05:24.374547 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testBadDefault): 0.0s
[       OK ] SparseToDenseTest.testBadDefault
[ RUN      ] SparseToDenseTest.testBadNumValues
2025-07-14 22:05:24.376781: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at sparse_to_dense_op.cc:218 : INVALID_ARGUMENT: sparse_values has incorrect shape [3], should be [] or [2]
INFO:tensorflow:time(__main__.SparseToDenseTest.testBadNumValues): 0.0s
I0714 22:05:24.376892 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testBadNumValues): 0.0s
[       OK ] SparseToDenseTest.testBadNumValues
[ RUN      ] SparseToDenseTest.testBadShape
2025-07-14 22:05:24.378949: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at sparse_to_dense_op.cc:218 : INVALID_ARGUMENT: output_shape must be rank 1, got shape [2,1]
INFO:tensorflow:time(__main__.SparseToDenseTest.testBadShape): 0.0s
I0714 22:05:24.379058 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testBadShape): 0.0s
[       OK ] SparseToDenseTest.testBadShape
[ RUN      ] SparseToDenseTest.testBadValue
2025-07-14 22:05:24.381188: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at sparse_to_dense_op.cc:218 : INVALID_ARGUMENT: sparse_values has incorrect shape [2,1], should be [] or [2]
INFO:tensorflow:time(__main__.SparseToDenseTest.testBadValue): 0.0s
I0714 22:05:24.381294 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testBadValue): 0.0s
[       OK ] SparseToDenseTest.testBadValue
[ RUN      ] SparseToDenseTest.testComplex
INFO:tensorflow:time(__main__.SparseToDenseTest.testComplex): 0.1s
I0714 22:05:24.478588 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testComplex): 0.1s
[       OK ] SparseToDenseTest.testComplex
[ RUN      ] SparseToDenseTest.testEmptyNonZeros
INFO:tensorflow:time(__main__.SparseToDenseTest.testEmptyNonZeros): 0.0s
I0714 22:05:24.481788 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testEmptyNonZeros): 0.0s
[       OK ] SparseToDenseTest.testEmptyNonZeros
[ RUN      ] SparseToDenseTest.testFloatTypes0 (tf.bfloat16)
INFO:tensorflow:time(__main__.SparseToDenseTest.testFloatTypes0 (tf.bfloat16)): 0.0s
I0714 22:05:24.485352 84022298500416 test_util.py:2574] time(__main__.SparseToDenseTest.testFloatTypes0 (tf.bfloat16)): 0.0s
[       OK ] SparseToDenseTest.testFloatTypes0 (tf.bfloat16)
[ RUN      ] SparseToDenseTest.testFloatTypes1 (tf.float16)
Fatal Python error: Segmentation fault

@TopRichard TopRichard marked this pull request as draft July 9, 2025 18:52
@TopRichard
Contributor Author

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jul 9, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.07/pr_35/13085596

date job status comment
Jul 09 19:03:52 UTC 2025 submitted job id 13085596 will be eligible to start in about 20 seconds
Jul 09 19:03:59 UTC 2025 received job awaits launch by Slurm scheduler
Jul 09 19:04:22 UTC 2025 running job 13085596 is running
Jul 09 19:06:05 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13085596.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17520879110.tar.gz size: 0 MiB (18163 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 09 19:06:05 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13085596.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
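The check list in the bot comment above can be read as a set of substring tests on the Slurm job output: some messages must be absent, others must be present. A minimal Python sketch of that idea follows. The function names and the all-or-nothing success rule are assumptions made for illustration; this is not the bot's actual implementation.

```python
import re

# Messages taken from the bot's check list above.
# Assumption: a build counts as successful only if every check passes.
MUST_BE_ABSENT = ["FATAL:", "ERROR:", "FAILED:", "required modules missing:"]
MUST_BE_PRESENT = ["No missing installations", ".tar.gz created!"]


def check_job_output(text):
    """Return {message: passed} for each check (hypothetical helper)."""
    results = {}
    for message in MUST_BE_ABSENT:
        # Passes when the message does not occur anywhere in the output.
        results[message] = re.search(re.escape(message), text) is None
    for message in MUST_BE_PRESENT:
        # Passes when the message occurs at least once in the output.
        results[message] = re.search(re.escape(message), text) is not None
    return results


def job_succeeded(text):
    """True only if all checks pass (hypothetical success rule)."""
    return all(check_job_output(text).values())
```

Under these assumptions, an output containing `ERROR:` but not `No missing installations` fails two checks, which matches the two ❌ marks in the first job report.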

@TopRichard
Contributor Author

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jul 9, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.07/pr_35/13085903

date job status comment
Jul 09 19:24:15 UTC 2025 submitted job id 13085903 will be eligible to start in about 20 seconds
Jul 09 19:24:21 UTC 2025 received job awaits launch by Slurm scheduler
Jul 09 19:24:43 UTC 2025 running job 13085903 is running
Jul 09 19:26:36 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13085903.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17520891370.tar.gz size: 0 MiB (18104 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 09 19:26:36 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13085903.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Contributor Author

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jul 9, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.07/pr_35/13086265

date job status comment
Jul 09 19:40:10 UTC 2025 submitted job id 13086265 will be eligible to start in about 20 seconds
Jul 09 19:40:22 UTC 2025 received job awaits launch by Slurm scheduler
Jul 09 19:40:35 UTC 2025 running job 13086265 is running
Jul 09 19:42:40 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13086265.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17520901000.tar.gz size: 0 MiB (18099 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 09 19:42:40 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13086265.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Contributor Author

bot: build instance:eessi-bot-surf repo:eessi.io-2023.06-software arch:zen4 accel:nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Jul 9, 2025

New job on instance eessi-bot-surf for CPU micro-architecture x86_64-amd-zen4 and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /projects/eessibot/eessi-bot-surf/jobs/2025.07/pr_35/13086405

date job status comment
Jul 09 20:24:47 UTC 2025 submitted job id 13086405 will be eligible to start in about 20 seconds
Jul 09 20:24:58 UTC 2025 received job awaits launch by Slurm scheduler
Jul 09 20:25:22 UTC 2025 running job 13086405 is running
Jul 09 20:27:16 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-13086405.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen4-17520927760.tar.gz size: 0 MiB (18104 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 09 20:27:16 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (2/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (3/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (4/8) Skipping GPU test : only 1 GPU available for this test case
[ SKIP ] (5/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (6/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (7/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ SKIP ] (8/8) Skipping test : 1 GPU(s) available for this test case, need exactly 2
[ PASSED ] Ran 0/8 test case(s) from 8 check(s) (0 failure(s), 8 skipped, 0 aborted)
Details
✅ job output file slurm-13086405.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Contributor Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic

@eessi-bot-aws

eessi-bot-aws bot commented Jul 9, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75015

date job status comment
Jul 09 20:40:10 UTC 2025 submitted job id 75015 awaits release by job manager
Jul 09 20:40:56 UTC 2025 released job awaits launch by Slurm scheduler
Jul 09 20:45:58 UTC 2025 running job 75015 is running
Jul 09 20:51:03 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75015.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-17520940310.tar.gz size: 0 MiB (18096 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic
2023.06/init/easybuild/eb_hooks.py
Jul 09 20:51:03 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] ( 1/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:x86_64_generic+default
P: perf: 373.396 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 2/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:x86_64_generic+default
P: perf: 387.924 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 3/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /775175bf @BotBuildTests:x86_64_generic+default
P: latency: 2.71 us (r:0, l:None, u:None)
[ OK ] ( 4/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /52707c40 @BotBuildTests:x86_64_generic+default
P: latency: 2.72 us (r:0, l:None, u:None)
[ OK ] ( 5/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node %device_type=cpu /b1aacda9 @BotBuildTests:x86_64_generic+default
P: latency: 4.53 us (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node %device_type=cpu /c6bad193 @BotBuildTests:x86_64_generic+default
P: latency: 4.55 us (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:x86_64_generic+default
P: latency: 0.68 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:x86_64_generic+default
P: latency: 0.72 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:x86_64_generic+default
P: bandwidth: 12428.62 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:x86_64_generic+default
P: bandwidth: 11405.27 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 10/10 test case(s) from 10 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-75015.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Contributor Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic accel:nvidia/cc90

@eessi-bot-aws

eessi-bot-aws bot commented Jul 9, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75016

date job status comment
Jul 09 20:52:19 UTC 2025 submitted job id 75016 awaits release by job manager
Jul 09 20:53:06 UTC 2025 released job awaits launch by Slurm scheduler
Jul 09 20:54:08 UTC 2025 running job 75016 is running
Jul 09 20:55:09 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75016.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-17520944580.tar.gz size: 0 MiB (18095 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 09 20:55:09 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-75016.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Contributor Author

The failure is:

== Summary:
   * [FAILED]  cuDNN/8.9.2.26-CUDA-12.1.1
   * [SKIPPED] TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1
0:00:00  0 out of 2 easyconfigs done
ERROR: Installation of cuDNN-8.9.2.26-CUDA-12.1.1.eb failed: "The End User License Agreement (EULA) for cuDNN is currently not accepted!\n(see https://docs.nvidia.com/deeplearning/cudnn/latest/reference/eula.html for more information)\nYou should either:\n- add --accept-eula-for=cuDNN to the 'eb' command;\n- update your EasyBuild configuration to always accept the EULA for cuDNN;\n- add 'accept_eula = True' to the easyconfig file you are using;\n"
Last EasyBuild log file copied from /tmp/eb-0sv_9why/easybuild-_r7fptuy.log to /eessi_bot_job
EasyBuild log file /tmp/eb-0sv_9why/easybuild-_r7fptuy.log copied to /project/def-users/SHARED/build-logs/jobs/75016/easybuild-_r7fptuy.log (with context appended)
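The error message lists three ways to accept the cuDNN EULA. Since this PR also touches `eb_hooks.py`, here is a sketch of how the third option (`accept_eula = True`) might be applied from an EasyBuild parse hook instead of editing the easyconfig. This is an illustration only and not necessarily what this PR or EESSI does. `FakeEasyConfig` is a hypothetical stand-in so the snippet runs without EasyBuild installed.

```python
def parse_hook(ec, *args, **kwargs):
    """Sketch of an EasyBuild parse hook that accepts the EULA for cuDNN only.

    Assumption: `ec` behaves like an EasyConfig, i.e. it has a `name`
    attribute and supports item assignment for easyconfig parameters.
    """
    if ec.name == 'cuDNN':
        # Equivalent to putting 'accept_eula = True' in the easyconfig file.
        ec['accept_eula'] = True


class FakeEasyConfig(dict):
    """Hypothetical minimal stand-in for an EasyConfig, for demonstration."""

    def __init__(self, name):
        super().__init__()
        self.name = name


if __name__ == '__main__':
    cudnn = FakeEasyConfig('cuDNN')
    parse_hook(cudnn)
    print(cudnn.get('accept_eula'))

    tensorflow = FakeEasyConfig('TensorFlow')
    parse_hook(tensorflow)
    print(tensorflow.get('accept_eula'))
```

Restricting the hook to `cuDNN` keeps the acceptance explicit per package, in the same spirit as the `--accept-eula-for=cuDNN` command-line option named in the error.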

@TopRichard
Contributor Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/generic accel:nvidia/cc90

@eessi-bot-aws

eessi-bot-aws bot commented Jul 10, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-generic and accelerator nvidia/cc90 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75225

date job status comment
Jul 10 11:05:14 UTC 2025 submitted job id 75225 awaits release by job manager
Jul 10 11:05:19 UTC 2025 released job awaits launch by Slurm scheduler
Jul 10 11:11:21 UTC 2025 running job 75225 is running
Jul 10 11:13:23 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75225.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-generic-17521459190.tar.gz size: 0 MiB (18096 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/generic/accel/nvidia/cc90
2023.06/init/easybuild/eb_hooks.py
Jul 10 11:13:23 UTC 2025 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-75225.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Contributor Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen3 accel:nvidia/cc80

@eessi-bot-aws

eessi-bot-aws bot commented Jul 10, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75226

date job status comment
Jul 10 11:29:38 UTC 2025 submitted job id 75226 awaits release by job manager
Jul 10 11:30:27 UTC 2025 released job awaits launch by Slurm scheduler
Jul 10 11:35:29 UTC 2025 running job 75226 is running
Jul 10 15:55:18 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75226.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17521618690.tar.gz size: 0 MiB (18096 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Jul 10 15:55:18 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-75226.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
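
The check list above can be read as a set of pattern searches over the job output file. Here is a minimal sketch, assuming the bot simply searches the Slurm output for each pattern. The patterns are copied from the Details lists in this thread; the names `CHECKS`, `check_job_output` and `job_succeeded`, and the rule that every check must pass, are hypothetical and not taken from the bot's source.

```python
import re

# (pattern, must_be_present) pairs copied from the "Details" lists in this thread.
# Assumption: a job counts as successful only if every check passes.
CHECKS = [
    (r"FATAL:", False),
    (r"ERROR:", False),
    (r"FAILED:", False),
    (r"required modules missing:", False),
    (r"No missing installations", True),
    (r"\.tar\.gz created!", True),
]


def check_job_output(text):
    """Return a list of (pattern, passed) tuples for the given job output."""
    results = []
    for pattern, must_be_present in CHECKS:
        found = re.search(pattern, text) is not None
        # A check passes when the pattern's presence matches the expectation.
        results.append((pattern, found == must_be_present))
    return results


def job_succeeded(text):
    """True only if every check in CHECKS passes (hypothetical rule)."""
    return all(passed for _, passed in check_job_output(text))
```

Under this reading, a single `ERROR:` line in the output is enough to mark the job as failed, even when the tarball was created.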

@TopRichard
Contributor Author

bot: help

@eessi-bot-aws

eessi-bot-aws bot commented Jul 10, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command help from TopRichard

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot
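
The command syntax described above can be sketched as a small parser. This is only an illustration: `parse_bot_commands` and the `ABBREVIATIONS` table are hypothetical, and the abbreviations are inferred from the "expanded format" lines in this thread rather than taken from the bot's source.

```python
import re

# Abbreviations seen in the "expanded format" lines of this thread.
# Assumption: the real bot may support more (or different) short forms.
ABBREVIATIONS = {
    "repo": "repository",
    "arch": "architecture",
    "accel": "accelerator",
}

# A command must begin at the start of a line: "bot: COMMAND [ARGUMENTS]*"
COMMAND_RE = re.compile(r"^bot:\s*(\S+)\s*(.*)$")


def parse_bot_commands(comment):
    """Return a list of (command, arguments) tuples found in a comment."""
    commands = []
    for line in comment.splitlines():
        match = COMMAND_RE.match(line)
        if match is None:
            continue  # not a bot command (e.g. indented or plain text)
        name, rest = match.groups()
        args = {}
        for token in rest.split():
            key, sep, value = token.partition(":")
            if not sep:
                continue  # hypothetical choice: skip tokens without key:value
            args[ABBREVIATIONS.get(key, key)] = value
        commands.append((name, args))
    return commands
```

For example, `parse_bot_commands("bot: build arch:zen3 accel:nvidia/cc80")` yields one `build` command with `architecture` and `accelerator` arguments.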

@eessi-bot-surf

eessi-bot-surf bot commented Jul 10, 2025

Updates by the bot instance eessi-bot-surf (click for details)
  • received bot command help from TopRichard

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-deucalion

eessi-bot-deucalion bot commented Jul 10, 2025

Updates by the bot instance eessi-bot-deucalion (click for details)
  • received bot command help from TopRichard

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@eessi-bot-jsc

eessi-bot-jsc bot commented Jul 10, 2025

Updates by the bot instance eessi-bot-jsc (click for details)
  • received bot command help from TopRichard

    • expanded format: help
  • handling command help resulted in:
    How to send commands to bot instances

    • Commands must be sent with a new comment (edits of existing comments are ignored).
    • A comment may contain multiple commands, one per line.
    • Every command begins at the start of a line and has the syntax bot: COMMAND [ARGUMENTS]*
    • Currently supported COMMANDs are: help, build, show_config, status

    For more information, see https://www.eessi.io/docs/bot

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 11, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen3 accel:nvidia/cc80 from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • received bot command build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software arch:zen3 accel:nvidia/cc80 from TopRichard

    • expanded format: build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software architecture:zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • account TopRichard has NO permission to submit build jobs
  • handling command build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software architecture:zen3 accelerator:nvidia/cc80 resulted in:

    • account TopRichard has NO permission to submit build jobs

@eessi-bot-aws

eessi-bot-aws bot commented Jul 11, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75437

date | job status | comment
Jul 11 08:35:13 UTC 2025 | submitted | job id 75437 awaits release by job manager
Jul 11 08:35:58 UTC 2025 | released | job awaits launch by Slurm scheduler
Jul 11 08:41:00 UTC 2025 | running | job 75437 is running
Jul 11 12:56:42 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75437.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17522375100.tar.gz
size: 0 MiB (18146 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Jul 11 12:56:42 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-75437.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 11, 2025

Label bot:build has been set by user TopRichard, but this person does not have permission to trigger builds

@laraPPr
Contributor

laraPPr commented Jul 11, 2025

bot: build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software arch:zen3 accel:nvidia/cc80

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 11, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software arch:zen3 accel:nvidia/cc80 from laraPPr

    • expanded format: build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software architecture:zen3 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software architecture:zen3 accelerator:nvidia/cc80 resulted in:

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 11, 2025

New job on instance eessi-bot-vsc-ugent for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /scratch/gent/vo/002/gvo00211/SHARED/jobs/2025.07/pr_35/15515880

date | job status | comment
Jul 11 08:42:56 UTC 2025 | submitted | job id 15515880 awaits release by job manager
Jul 11 08:44:34 UTC 2025 | released | job awaits launch by Slurm scheduler
Jul 11 08:46:38 UTC 2025 | running | job 15515880 is running
Jul 11 12:32:38 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-15515880.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17522370050.tar.gz
size: 0 MiB (18162 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Jul 11 12:32:38 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-15515880.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@TopRichard
Contributor Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen3 accel:nvidia/cc80

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 11, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen3 accel:nvidia/cc80 from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • account TopRichard has NO permission to submit build jobs

@eessi-bot-aws

eessi-bot-aws bot commented Jul 11, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75438

date | job status | comment
Jul 11 12:35:43 UTC 2025 | submitted | job id 75438 awaits release by job manager
Jul 11 12:36:09 UTC 2025 | released | job awaits launch by Slurm scheduler
Jul 11 12:41:20 UTC 2025 | running | job 75438 is running
Jul 11 12:43:25 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75438.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17522377150.tar.gz
size: 0 MiB (18142 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Jul 11 12:43:25 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-75438.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 11, 2025

Label bot:build has been set by user TopRichard, but this person does not have permission to trigger builds

@TopRichard
Contributor Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen3 accel:nvidia/cc80

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 11, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen3 accel:nvidia/cc80 from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • account TopRichard has NO permission to submit build jobs

@eessi-bot-aws

eessi-bot-aws bot commented Jul 11, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75439

date | job status | comment
Jul 11 19:40:17 UTC 2025 | submitted | job id 75439 awaits release by job manager
Jul 11 19:41:17 UTC 2025 | released | job awaits launch by Slurm scheduler
Jul 11 19:46:20 UTC 2025 | running | job 75439 is running
Jul 12 00:06:57 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75439.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17522777590.tar.gz
size: 0 MiB (18148 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
no software packages in tarball
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Jul 12 00:06:57 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-75439.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 11, 2025

Label bot:build has been set by user TopRichard, but this person does not have permission to trigger builds

@TopRichard
Contributor Author

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen3 accel:nvidia/cc80

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 12, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws arch:x86_64/amd/zen3 accel:nvidia/cc80 from TopRichard

    • expanded format: build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen3 accelerator:nvidia/cc80
  • handling command build repository:eessi.io-2023.06-software instance:eessi-bot-mc-aws architecture:x86_64/amd/zen3 accelerator:nvidia/cc80 resulted in:

    • account TopRichard has NO permission to submit build jobs

@eessi-bot-aws

eessi-bot-aws bot commented Jul 12, 2025

New job on instance eessi-bot-mc-aws for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2025.07/pr_35/75648

date | job status | comment
Jul 12 17:46:21 UTC 2025 | submitted | job id 75648 awaits release by job manager
Jul 12 17:46:25 UTC 2025 | released | job awaits launch by Slurm scheduler
Jul 12 17:51:28 UTC 2025 | running | job 75648 is running
Jul 13 04:51:47 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-75648.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17523818030.tar.gz
size: 429 MiB (450023744 bytes)
entries: 18546
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Jul 13 04:51:47 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-75648.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
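
The Artefacts summary above groups tarball entries into modules, software and other files. A rough sketch of how such a summary could be derived with Python's `tarfile` module follows. The function name `summarize_tarball` and the grouping rules (a `.lua` file under `modules/all` is a module, the first two path components under `software` name a package, everything else is "other") are assumptions made for illustration, not the bot's actual logic.

```python
import tarfile


def summarize_tarball(tar, prefix):
    """Group the members of an open TarFile relative to an install prefix.

    Assumed layout: <prefix>/modules/all/<name>/<version>.lua for modules
    and <prefix>/software/<name>/<version>/... for software.
    """
    modules_dir = f"{prefix}/modules/all/"
    software_dir = f"{prefix}/software/"
    modules, software, other = set(), set(), set()
    members = tar.getmembers()
    for member in members:
        if not member.isfile():
            continue  # only regular files are grouped
        name = member.name
        if name.startswith(modules_dir):
            relative = name[len(modules_dir):]
            if relative.endswith(".lua"):
                modules.add(relative)
        elif name.startswith(software_dir):
            parts = name[len(software_dir):].split("/")
            if len(parts) >= 2:
                software.add("/".join(parts[:2]))  # <name>/<version>
        else:
            other.add(name)
    return {
        "entries": len(members),
        "modules": sorted(modules),
        "software": sorted(software),
        "other": sorted(other),
    }
```

With a summary like this, a tarball that contains only `eb_hooks.py` shows up with empty `modules` and `software` lists, which matches the "no module files in tarball" reports earlier in the thread.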

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 12, 2025

Label bot:build has been set by user TopRichard, but this person does not have permission to trigger builds

@TopRichard
Contributor Author

The build process fails because of a permission issue, as shown in the log:

  >> generating module file @ /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all/TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1.lua
== Running post-module hook...
== ... (took 3 secs)
== permissions...
== ... (took 15 secs)
== packaging...
== ... (took < 1 sec)
  >> running shell command:
        bzip2 /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1/easybuild/easybuild-TensorFlow-2.15.1-20250713.042659.log
        [started at: 2025-07-13 04:27:19]
        [working dir: /eessi_bot_job]
        [output and state saved to /tmp/eb-roc_08u5/eb-6wr594yv/run-shell-cmd-output/bzip2-rq8_wa7q]
  >> command completed: exit 0, ran in 00h11m42s
== Summary:
   * [FAILED]  TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1
ERROR: Installation of TensorFlow-2.15.1-foss-2023a-CUDA-12.1.1.eb failed: "Failed to copy file /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/EasyBuild/5.1.1/easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.1.0_fix-cuda-build.patch to /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1/easybuild/TensorFlow-2.1.0_fix-cuda-build.patch: [Errno 13] Permission denied: '/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1/easybuild/TensorFlow-2.1.0_fix-cuda-build.patch'"
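
When comparing this failure across bot instances, it can help to pull the source path, destination path and errno out of the error line. The sketch below is a hypothetical helper (`parse_copy_failure` is not part of EasyBuild or the bot); the regular expression is written against the message format shown in the log above and may not match other EasyBuild versions.

```python
import re

# Written against the "Failed to copy file SRC to DST: [Errno N] REASON: ..."
# message shown in the log above; the format is assumed, not guaranteed.
COPY_ERROR_RE = re.compile(
    r"Failed to copy file (?P<src>\S+) to (?P<dst>\S+): "
    r"\[Errno (?P<errno>\d+)\] (?P<reason>[^:]+):"
)


def parse_copy_failure(line):
    """Return src, dst, errno and reason from a copy-failure line, or None."""
    match = COPY_ERROR_RE.search(line)
    if match is None:
        return None
    return {
        "src": match["src"],
        "dst": match["dst"],
        "errno": int(match["errno"]),
        "reason": match["reason"],
    }
```

In the log above, the destination lies under the `accel/nvidia/cc80` software directory of the new installation, and the errno is 13 (permission denied).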

@bedroge
Contributor

bedroge commented Jul 13, 2025

Oh, crap, that's probably because of this bug that I introduced and which @ocaisa discovered (easybuilders/easybuild-framework#4959). Not sure if we can easily solve that without patching the EasyBuild installation...

TopRichard marked this pull request as ready for review July 13, 2025 09:10

@laraPPr
Contributor

laraPPr commented Jul 14, 2025

bot: build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software arch:zen3 accel:nvidia/cc80

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 14, 2025

Updates by the bot instance eessi-bot-vsc-ugent (click for details)
  • received bot command build instance:eessi-bot-vsc-ugent repo:eessi.io-2023.06-software arch:zen3 accel:nvidia/cc80 from laraPPr

    • expanded format: build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software architecture:zen3 accelerator:nvidia/cc80
  • handling command build instance:eessi-bot-vsc-ugent repository:eessi.io-2023.06-software architecture:zen3 accelerator:nvidia/cc80 resulted in:

@gpu-bot-ugent

gpu-bot-ugent bot commented Jul 14, 2025

New job on instance eessi-bot-vsc-ugent for CPU micro-architecture x86_64-amd-zen3 and accelerator nvidia/cc80 for repository eessi.io-2023.06-software in job dir /scratch/gent/vo/002/gvo00211/SHARED/jobs/2025.07/pr_35/15516080

date | job status | comment
Jul 14 09:02:06 UTC 2025 | submitted | job id 15516080 awaits release by job manager
Jul 14 09:03:25 UTC 2025 | released | job awaits launch by Slurm scheduler
Jul 14 09:05:29 UTC 2025 | running | job 15516080 is running
Jul 14 17:04:18 UTC 2025 | finished | 😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-15516080.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen3-17525125090.tar.gz
size: 431 MiB (452232735 bytes)
entries: 18546
modules under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1.lua
software under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1
other under 2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
2023.06/init/easybuild/eb_hooks.py
Jul 14 17:04:18 UTC 2025 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-15516080.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@laraPPr
Contributor

laraPPr commented Jul 15, 2025

Error on the Ghent bot

ERROR: Installation of TensorFlow-2.15.1-foss-2023a-CUDA-12.1.1.eb failed: "Failed to copy file /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/EasyBuild/5.1.1/easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.1.0_fix-cuda-build.patch to /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1/easybuild/TensorFlow-2.1.0_fix-cuda-build.patch: [Errno 13] Permission denied: '/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software/TensorFlow/2.15.1-foss-2023a-CUDA-12.1.1/easybuild/TensorFlow-2.1.0_fix-cuda-build.patch'"

Last EasyBuild log file copied from /tmp/eb-ti45hz1z/easybuild-4tfihxgn.log to /eessi_bot_job
